Integration of Heterogeneous Information Sources

Authors

  • Yakov Kogan
  • Yehoshua Sagiv
  • Yaron Kanza
  • David Michaeli
  • Werner Nutt
Abstract

This thesis presents a framework for integrating information from multiple heterogeneous sources. The framework consists of an HTML extension called OHTML and the W3Graph query language. OHTML adds semantic tags to HTML. W3Graph is a query language for semistructured data that we designed for querying OHTML pages.

We designed OHTML to enrich HTML pages with semantic information describing their contents. The OHTML format facilitates expressing metadata (information about information) by providing a mechanism for embedding semantic tags in HTML pages. These semantic tags describe OEM data structures presented in HTML documents. The data structures do not have an absolute schema fixed in advance, which is why information described in this way is called semistructured. Formally, the data structure corresponds to the Object-Exchange Model (OEM) described in [PGMW95].

We present the W3Graph query language for querying OHTML pages. Queries in our language are defined by means of graphs. This approach provides the important advantages of conciseness and clarity, especially when defining queries that involve several path expressions. Another important feature of the W3Graph query language is that it allows restructuring query results according to user needs and producing answers as sets of OHTML documents.

We define the notion of a maximal assignment, and we classify constraints in two different ways. First, constraints are either weak or strong, depending on how their semantics is defined in the presence of nulls. Second, constraints are either search or filter constraints, depending on how they are used to form the result of a query. Enriching the language with these constructs allows us to handle missing or incomplete information efficiently and in a conceptually convenient way. The role of maximal assignments is very close to the role of full disjunctions defined in [GL94].
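To make the weak/strong distinction and the idea of maximality more concrete, here is a minimal Python sketch. It is not the W3Graph implementation or the thesis's algorithm. It assumes one plausible reading of the abstract: a weak constraint is satisfied whenever one of its variables is null, a strong constraint requires all of its variables to be non-null, and an assignment is maximal when no other candidate agrees with it on every non-null variable while binding more. The names `weak`, `strong`, `subsumes`, and `maximal`, and the sample data, are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A partial assignment maps query variables to values; None stands for a null.
Assignment = Dict[str, Optional[str]]


def weak(pred: Callable[..., bool], a: Assignment, *variables: str) -> bool:
    """Assumed weak semantics: satisfied if any variable is null, else test pred."""
    values = [a.get(v) for v in variables]
    if any(value is None for value in values):
        return True
    return pred(*values)


def strong(pred: Callable[..., bool], a: Assignment, *variables: str) -> bool:
    """Assumed strong semantics: every variable must be non-null and pred must hold."""
    values = [a.get(v) for v in variables]
    if any(value is None for value in values):
        return False
    return pred(*values)


def subsumes(a: Assignment, b: Assignment) -> bool:
    """True if a agrees with b on every variable that b binds to a non-null value."""
    return all(b[v] is None or a.get(v) == b[v] for v in b)


def maximal(candidates: List[Assignment]) -> List[Assignment]:
    """Keep only candidates not subsumed by a different, more informative one.

    This is a quadratic filter for illustration only, not the polynomial
    algorithm that the thesis describes.
    """
    return [
        b for b in candidates
        if not any(a != b and subsumes(a, b) for a in candidates)
    ]


# Hypothetical candidate assignments for a query over (title, year).
rows: List[Assignment] = [
    {"title": "T1", "year": "1999"},
    {"title": "T1", "year": None},   # less informative than the first row
    {"title": "T2", "year": None},   # year is missing in the source
]


def is_recent(year: str) -> bool:
    return int(year) >= 1998


kept = maximal(rows)
weak_hits = [r for r in kept if weak(is_recent, r, "year")]
strong_hits = [r for r in kept if strong(is_recent, r, "year")]
```

Under these assumptions, the second row is dropped because the first row carries the same title plus a year. The weak reading of the year condition keeps the entry whose year is missing, while the strong reading discards it.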
We describe an algorithm for computing maximal assignments. The running time of the algorithm is polynomial in the size of the input and the output. The thesis also describes the W3Graph system, which implements the W3Graph query language and the query-processing algorithm. The system lets users query OHTML pages and publish restructured results in OHTML format. It is written entirely in Java as a combination of a server and a client; the client runs as an applet in a Web browser.

Similar resources

Adaptive Information Analysis in Higher Education Institutes

Information integration plays an important role in academic environments, since it provides a comprehensive view of education data and enables managers to analyze and evaluate the effectiveness of education processes. However, the problem with traditional information integration is the lack of personalization due to weak information resources or unavailable analysis functionality. In this ...

Information Integration from Heterogeneous Web Sources

Information integration is the problem of providing a unified and transparent view of a collection of data stored in multiple, autonomous, and heterogeneous data sources. In this study we focus only on inter-organization information integration (web), as different web sources have different database management systems for data management and different DBMSs have their own synta...

Object Exchange Across Heterogeneous Information Sources

We address the problem of providing integrated access to diverse and dynamic information sources. We explain how this problem differs from the traditional database integration problem and we focus on one aspect of the information integration problem, namely information exchange. We define an object-based information exchange model and a corresponding query language that we believe are well suited...

Dynamic Creation of Multimedia Web Views on Heterogeneous Information Sources

Integrating the World Wide Web with other information sources is a strong requirement in recent advanced application environments. Although information sources contain various types of data objects, the volume of multimedia objects has been increasing drastically. This paper proposes a multimedia integration scheme using SMIL in a mediator-based integration system. The scheme achieves dynami...

Information Extraction and Integration from Heterogeneous, Distributed, Autonomous Information Sources : A Federated Ontology-Driven Query-Centric Approach

This paper motivates and describes the data integration component of INDUS (Intelligent Data Understanding System) environment for data-driven information extraction and integration from heterogeneous, distributed, autonomous information sources. The design of INDUS is motivated by the requirements of applications such as scientific discovery, in which it is desirable for users to be able to ac...

Publication date: 2007